The COST278 Pan-European Broadcast News Database

Authors

An Vandecatseye

Jean-Pierre Martens

João Paulo da Silva Neto

Hugo Meinedo

Carmen García-Mateo

Javier Dieguez-Tirado

France Mihelic

Janez Zibert

Jan Nouza

Petr David

Matús Pleva

Anton Cizmar

Harris Papageorgiou

Christina Alexandris

Abstract

This paper describes a pan-European multilingual audio and video database of broadcast news shows. The database was constructed by seven institutions that are collaborating in the European COST278 action on Spoken Language Interaction in Telecommunications. At present, the database comprises broadcast news shows in seven languages, namely Dutch, Portuguese, Galician, Czech, Slovenian, Slovakian and Greek, but the policy is to attract new partners who contribute new data constructed and transcribed according to the rules and procedures outlined in this paper. The data come with evaluation software that should facilitate the comparison of experiments.

Similar resources

An improved preprocessor for the automatic transcription of broadcast news audio stream

This paper deals with the preprocessing of the broadcast news (BN) audio stream for automatic transcription purposes. The preprocessing consists of automatic segmentation followed by broad-class segment identification. The former is capable of detecting speaker and/or acoustic changes in the BN audio stream with a precision of 82.75%. The latter acts as a filter that removes no...

Czech-to-slovak adapted broadcast news transcription system

The first broadcast news (BN) transcription system for Slovak is introduced. It employs the same modules as the system we developed earlier for Czech. We utilize the similarity between the two languages in efficient lexicon building, in mapping Slovak-specific (rarely occurring) phonemes onto Czech ones, and in low-resource cross-lingual adaptation of the acoustic model. The system uses 166K-word lexico...

A Stream-based Audio Segmentation, C Pre-processing System for Broadcast

This paper describes our work on the development of a low latency stream-based audio pre-processing system for broadcast news using model-based techniques. It performs speech/nonspeech classification, speaker segmentation, speaker clustering, gender and background conditions classification. As a way to increase the modelling accuracy our algorithms make extensive use of Artificial Neural Networ...

Very large vocabulary speech recognition system for automatic transcription of czech broadcast programs

This paper describes the first speech recognition system capable of transcribing a wide range of spoken broadcast programs in the Czech language with an OOV rate below 3 per cent. To achieve that level we had to a) create an optimized 200k word vocabulary with multiple text and pronunciation forms, b) extract an appropriate language model from a 300M word text corpus and c) develop an own de...

The COST278 broadcast news segmentation and speaker clustering evaluation - overview, methodology, systems, results

This paper describes a large-scale experiment in which eight research institutions have tested their audio partitioning and labeling algorithms on the same data, a multilingual database of news broadcasts, using the same evaluation tools and protocols. The experiments have provided more insight into the cross-lingual robustness of the methods and they have demonstrated that by further collaborati...

Publication date: 2004
